pymldb's Progress Bar and Cancel Button Tutorial
This tutorial showcases the use of progress bars and cancel buttons for long-running procedures run with pymldb in a Jupyter notebook. These let a user see the progress of a procedure and cancel it.
If you have not done so already, we encourage you to go through the Using pymldb Tutorial.
To use this feature, you only need to slightly modify the way you execute procedures. For example, when doing an HTTP PUT, you would call mldb.put_and_track() instead of mldb.put().
The cancel button is displayed as soon as the procedure run id is found. The button is removed as soon as the procedure finishes either normally or with an error.
The progress bars are rendered with the tqdm library (tqdm/tqdm). They are displayed as soon as a procedure enters the "executing" state and are then refreshed at every refresh interval for as long as the procedure stays in that state. A bar moves to a valid state (it turns green) when its step or procedure finishes normally, and to a danger state (it turns red) when it finishes with an error.
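The display rules above can be sketched as a small state machine. This is only an illustrative simulation, not pymldb's actual implementation: the BarDisplay class and the replay helper are hypothetical, and "finished" and "error" are assumed names for the two terminal states (only "initializing" and "executing" are named in this tutorial). The polled states are hard-coded rather than fetched from MLDB.

```python
# Hypothetical sketch of the bar display rules; not pymldb's real code.
# "finished" and "error" are assumed names for the terminal states.

class BarDisplay(object):
    """Tracks the style of one progress bar; None means no bar is shown."""

    def __init__(self):
        self.style = None

    def update(self, state):
        if state == "executing":
            self.style = "active"       # bar is shown and refreshed
        elif self.style is not None and state == "finished":
            self.style = "success"      # bar turns green
        elif self.style is not None and state == "error":
            self.style = "danger"       # bar turns red
        # Any other case (for example "initializing") leaves the style as is.
        return self.style


def replay(states):
    """Return the bar style after each polled state, in order."""
    bar = BarDisplay()
    return [bar.update(state) for state in states]


normal_run = replay(["initializing", "executing", "executing", "finished"])
failed_run = replay(["initializing", "executing", "error"])
print(normal_run)
print(failed_run)
```

In this sketch a bar only reaches the green or red style if it was displayed first, which matches the rule that bars appear once the "executing" state has been observed.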
If a procedure runs too quickly, the progress bars will not be displayed because the application logic will not have time to catch the "executing" phase. If a procedure stays in the "initializing" phase for some time, the "Cancel" button will be visible with no progress bars as long as the "executing" phase is not reached.
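The timing effect described above can be illustrated with a small simulation that samples a procedure's state at a fixed interval. It is a sketch under stated assumptions: state_at and poll are hypothetical helpers, the timelines are made up, and "finished" is an assumed name for the terminal state. No MLDB connection is involved.

```python
# Hypothetical polling simulation; timelines and helper names are illustrative.

def state_at(timeline, t_ms):
    """Return the state in effect at t_ms.

    timeline is a list of (start_ms, state) pairs sorted by start_ms.
    """
    current = timeline[0][1]
    for start_ms, state in timeline:
        if start_ms <= t_ms:
            current = state
    return current


def poll(timeline, interval_ms):
    """Sample the state every interval_ms until a terminal state is seen."""
    seen = []
    t_ms = 0
    while True:
        state = state_at(timeline, t_ms)
        seen.append(state)
        if state in ("finished", "error"):
            return seen
        t_ms += interval_ms


# A procedure that finishes within one polling interval: "executing" is missed.
fast = poll([(0, "initializing"), (20, "executing"), (60, "finished")], 100)

# A procedure that initializes for a while: several polls see no "executing".
slow_start = poll([(0, "initializing"), (3000, "executing"), (4000, "finished")], 1000)

print(fast)
print(slow_start)
```

In the first case the tracker never observes "executing", so no bar would be drawn. In the second, the first three polls only see "initializing", which corresponds to the period where the "Cancel" button is visible without any progress bar.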
Here we start with the obligatory lines to import pymldb and initialize the connection to MLDB.
In [13]:
import pymldb
mldb = pymldb.Connection()
In [8]:
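# Run a mock procedure lasting 8 seconds and track it.
# The last argument (0.5) is the refresh interval of the display, assumed to be in seconds.
print(mldb.post_and_track('/v1/procedures', {
    'type' : 'mock',
    'params' : {'durationMs' : 8000, 'refreshRateMs' : 500}
}, 0.5))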
In [9]:
# Import a CSV file into a dataset, refreshing the display every 0.1 (assumed seconds).
print(mldb.put_and_track('/v1/procedures/embedded_imagess', {
    'type' : 'import.text',
    'params' : {
        'dataFileUrl' : 'https://s3.amazonaws.com/benchm-ml--main/train-1m.csv',
        'outputDataset' : {
            'id' : 'embedded_images_realestate',
            'type' : 'sparse.mutable'
        }
    }
}, 0.1))
In [11]:
prefix = 'http://public.mldb.ai/datasets/dataset-builder'

# Run three steps in sequence: a mock step, a CSV import, and another mock step.
# No refresh interval is passed here, so the default is used.
print(mldb.post_and_track('/v1/procedures', {
    'type' : 'serial',
    'params' : {
        'steps' : [
            {
                'type' : 'mock',
                'params' : {'durationMs' : 2000, 'refreshRateMs' : 500}
            }, {
                'type' : 'import.text',
                'params' : {
                    'dataFileUrl' : prefix + '/cache/dataset_creator_embedding_realestate.csv.gz',
                    'outputDataset' : {
                        'id' : 'embedded_images_realestate',
                        'type' : 'embedding'
                    },
                    'select' : '* EXCLUDING(rowName)',
                    'named' : 'rowName',
                }
            }, {
                'type' : 'mock',
                'params' : {'durationMs' : 2000, 'refreshRateMs' : 500}
            }
        ]
    }
}))
Check out the other Tutorials and Demos.